Intro

One of the defining principles of Social Network Analysis is that it draws heavily on graphic imagery. Hence, to perform such analyses, it is crucial to learn how to depict the data properly. For this tutorial, I will – again – use the dataset containing Congress members’ agreement on abortion bills. First, I need to read in the data and create the tbl_graph.

library(tidyverse)
library(tidygraph)
library(ggraph)

abortion_nodes <- read_csv("data/abortion-bills/nodelist_abortion_bills.csv",
                           col_types = cols(
                             id = col_character(), # node names need to be characters
                             female = col_integer(),
                             democrat = col_integer())) %>% 
  mutate(party = case_when(democrat == 1 ~ "Democrat",
                           democrat == 0 ~ "Republican"),
         party = as_factor(party),
         gender = case_when(female == 1 ~ "female",
                            female == 0 ~ "male"),
         gender = as_factor(gender)) %>% 
  select(-democrat, -female)

abortion_edges <- read_csv("data/abortion-bills/edgelist_abortion_bills.csv",
                           col_types = cols(
                             from = col_character(), # node names need to be characters
                             to = col_character(), # node names need to be characters
                             weight = col_double()
                             )) %>%
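  group_by(from, to) %>% 
  summarize(weight = sum(weight), .groups = "drop") %>% # drop the grouping so that later steps work on the whole table
  mutate(weight = as.integer(floor(weight))) %>% 
  filter(weight > 0)

# randomly sample 1,000 edges from the full edge list
sample_edges <- abortion_edges %>% 
  slice_sample(n = 1000)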
sample_nodes <- abortion_nodes %>% 
  filter(id %in% sample_edges$from | id %in% sample_edges$to)

abortion_graph <- tbl_graph(nodes = sample_nodes,
                            edges = sample_edges)

Now that the tbl_graph is created, you might want to get a quick overview of it. You can achieve this using autograph():

autograph(abortion_graph)

In this case, the network is far too dense for autograph() to display it in a meaningful way. I could probably tweak some of its arguments. However, building the plot layer by layer with ggraph is easier, so that is what I will do.

Some of you may be familiar with the “layered grammar of graphics” (see, for instance, Wickham (2010) for more information). ggplot2, probably the most popular R package for visualizing data, makes good use of it. In a nutshell, every plot consists of layers, and every layer adds a new feature to it. For our graph, building up a plot with a nice layout (“stress”), nodes colored according to their party affiliation (Democrats blue, Republicans red), and edge alpha (transparency) mapped to the edge weights looks like this:

ggraph(abortion_graph) +
  labs(caption = "Basic plot")
## Using `stress` as default layout

ggraph(abortion_graph) +
  geom_edge_link0(aes(edge_alpha = weight), edge_color = "grey66") +
  labs(caption = "Basic plot + edges")
## Using `stress` as default layout

ggraph(abortion_graph) +
  geom_edge_link0(aes(edge_alpha = weight), edge_color = "grey66") +
  geom_node_point(aes(fill = party), shape = 21) +
  labs(caption = "Basic plot + edges + nodes")
## Using `stress` as default layout

ggraph(abortion_graph) +
  geom_edge_link0(aes(edge_alpha = weight), edge_color = "grey66") +
  geom_node_point(aes(fill = party), shape = 21) +
  scale_fill_manual(values = c(Republican = "red", Democrat = "blue")) +
  labs(caption = "Basic plot + edges + nodes + colors aligned to party")
## Using `stress` as default layout

So, when plotting networks with ggraph, you need to provide it with at least three things: the layout you want to use, what you want to do with the edges, and what you want to do with the nodes.

Layouts

A layout determines where the nodes are placed on the x and y axes. The positions are computed by a layout algorithm. Which one to choose largely depends on the graph you want to depict. However, the default choice is layout = "stress", which produces fairly nice layouts for most graphs. Hence, you should not worry too much about it. You usually provide the layout in the initial ggraph call.

ggraph(abortion_graph, layout = "fr") +
  geom_edge_link0(aes(edge_alpha = weight), edge_color = "grey66") +
  geom_node_point(aes(fill = party), shape = 21) +
  labs(caption = "Basic plot + edges + nodes")
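
To make the idea of "node positions" more tangible, here is a minimal sketch that computes layouts without drawing anything. It uses create_layout() from ggraph on a small toy graph (toy_graph is a hypothetical six-node ring that I made up for illustration; it is not part of the abortion data).

```r
library(tidygraph)
library(ggraph)

# Hypothetical toy graph for illustration: a ring of six nodes
toy_graph <- create_ring(6)

# create_layout() runs the layout algorithm and returns one row per node
layout_circle <- create_layout(toy_graph, layout = "circle")
layout_stress <- create_layout(toy_graph, layout = "stress")

# The x and y columns hold the coordinates that ggraph would plot
as.data.frame(layout_circle)[, c("x", "y")]
as.data.frame(layout_stress)[, c("x", "y")]
```

Comparing the two tables shows that the graph is the same in both cases; only the coordinates assigned to the nodes differ.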

The GIF here shows a couple of layouts. It was created by Thomas Lin Pedersen, the author of the ggraph package.

geom_edge_*

After you have made the initial ggraph call providing the graph object and the layout, you need to tell ggraph how to draw the edges – using an edge geom. geom stands for geometric object. It is the first layer you need to add to your ggraph call. Order matters: if you add the node geom first and the edge geom afterwards, the nodes will be drawn underneath the edges.

For your first graphs – and probably 80 percent of the graphs you will ever want to plot – geom_edge_link0() will suffice. It draws a straight line between each pair of connected nodes.

Aesthetic mappings describe how variables in the data are mapped to visual properties (aesthetics) of geoms. Hence, they are plot parameters that are determined by the data. There are a couple of arguments you can either map to the data (by using aes() inside of geom_edge_link0()) or set globally (by putting them directly inside geom_edge_link0()).

The examples below use edge_color, edge_width, edge_linetype, and edge_alpha:

ggraph(abortion_graph) +
  geom_edge_link0(aes(edge_color = weight)) # continuous scale
## Using `stress` as default layout

ggraph(abortion_graph) +
  geom_edge_link0(aes(edge_color = as_factor(weight))) # discrete scale
## Using `stress` as default layout

ggraph(abortion_graph) +
  geom_edge_link0(aes(edge_width = weight)) # continuous scale --> does not make sense here…
## Using `stress` as default layout

# ggraph(abortion_graph) +
#   geom_edge_link0(aes(edge_linetype = weight), edge_color = "grey66") # continuous scale -- linetype only works with categorical variables
ggraph(abortion_graph) +
  geom_edge_link0(aes(edge_linetype = as_factor(weight)), edge_color = "grey66") # here we go
## Using `stress` as default layout

ggraph(abortion_graph) +
  geom_edge_link0(aes(edge_alpha = weight), edge_color = "grey66")
## Using `stress` as default layout
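
The difference between mapping an aesthetic and setting it globally can be sketched on a small example. The toy graph below (toy_nodes, toy_edges, and their values) is hypothetical and only serves as an illustration; the calls themselves are the same ones used above.

```r
library(tidygraph)
library(ggraph)

# Hypothetical toy data: four nodes, four weighted edges
toy_nodes <- data.frame(name = c("a", "b", "c", "d"))
toy_edges <- data.frame(from = c("a", "a", "b", "c"),
                        to = c("b", "c", "c", "d"),
                        weight = c(1, 2, 3, 4))
toy_graph <- tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = FALSE)

# Mapped: inside aes(), so the edge width varies with the weight column
p_mapped <- ggraph(toy_graph, layout = "circle") +
  geom_edge_link0(aes(edge_width = weight), edge_color = "grey66")

# Set globally: outside aes(), so every edge gets the same width
p_fixed <- ggraph(toy_graph, layout = "circle") +
  geom_edge_link0(edge_width = 0.5, edge_color = "grey66")

p_mapped
p_fixed
```

In the first plot a legend for the edge width appears because the width carries information; in the second one there is nothing to explain, so no legend is drawn.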

geom_node_*

However, the plots above did not really make sense because they contained no nodes. Nodes can be added using geom_node_point() or, if you want node names to be included, geom_node_text().

For geom_node_point() you need to choose a shape from this overview (which was obtained from a blog post by David Schoch):

![Forms](http://www.sthda.com/sthda/RDoc/images/points-symbols.png)

Aesthetics such as color, fill, and shape can be used within geom_node_point(). You can, again, put them either within aes() – if they should depend on the data at hand – or set them globally:

ggraph(abortion_graph) +
  geom_edge_link0(color = "grey66") +
  geom_node_point(aes(color = party, shape = gender))
## Using `stress` as default layout

The most important arguments for geom_node_text() are label, color, family, and size:

ggraph(abortion_graph) +
  geom_edge_link0(color = "grey66") +
  geom_node_text(aes(label = party, color = gender), family = "Times", size = 2)
## Using `stress` as default layout

Another handy feature is that you can filter within geom_node_text(). You can, for instance, display only the IDs of female Democrats. You can also add node points, of course. Just make sure that they come before the labels (and after the edges).

ggraph(abortion_graph) +
  geom_edge_link0(color = "grey66") +
  geom_node_point(aes(color = party)) +
  geom_node_text(aes(filter = (party == "Democrat" & gender == "female"), label = id), family = "Times", size = 2)
## Using `stress` as default layout

scale_*

But what if you want to influence the things which are in the aes()? This is where the scale_* functions come in handy.

Which one to choose follows from the structure of their names: scale_<aes>_<variable type>(), at least for nodes. In the case of edges, put edge_ in front of <aes>: scale_edge_<aes>_<variable type>().

<aes> is straight-forward: it refers to what was specified in aes() – e.g., color.

Variable type, on the other hand, is not that intuitive. As the name implies, it depends on the type of variable which is used in the aes().

If it is a categorical variable, add _manual. In the case of the node colors, you can simply provide a character vector that contains the colors. If you want full control, provide a named vector whose names are the categories of your categorical variable (see footnote 1):

ggraph(abortion_graph) +
  geom_edge_link0(aes(edge_alpha = weight), edge_color = "grey66") +
  geom_node_point(aes(fill = party), shape = 21) +
  scale_fill_manual(values = c(Republican = "red", Democrat = "blue"))
## Using `stress` as default layout
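
The same naming scheme applies to edges. As a minimal sketch, assume a toy graph whose edges carry a categorical attribute (relation is a hypothetical column that I made up; the abortion data do not contain it). Following the pattern scale_edge_<aes>_<variable type>(), the matching function is scale_edge_color_manual():

```r
library(tidygraph)
library(ggraph)

# Hypothetical toy data: edges with a categorical attribute called relation
toy_nodes <- data.frame(name = c("a", "b", "c", "d"))
toy_edges <- data.frame(from = c("a", "a", "b", "c"),
                        to = c("b", "c", "c", "d"),
                        relation = c("ally", "rival", "ally", "rival"))
toy_graph <- tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = FALSE)

# edge_color is mapped in aes(), so the edge scale controls its colors
p_edge_scale <- ggraph(toy_graph, layout = "circle") +
  geom_edge_link0(aes(edge_color = relation)) +
  geom_node_point() +
  scale_edge_color_manual(values = c(ally = "darkgreen", rival = "orange"))

p_edge_scale
```

As with the node colors, the named vector makes explicit which category receives which color.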

For continuous variables, add _continuous. Here, the most important argument is range. It defines the minimum and maximum of the visual property (e.g., the thinnest and the thickest edge) that the data values are mapped onto. This is especially useful for width and size.

ggraph(abortion_graph) +
  geom_edge_link0(aes(edge_width = weight), edge_color = "grey66") +
  geom_node_point(aes(fill = party), shape = 21) +
  scale_fill_manual(values = c(Republican = "red", Democrat = "blue")) +
  scale_edge_width_continuous(range = c(0.1,0.7)) +
  ggtitle("Agreement over abortion bills")
## Using `stress` as default layout

Themes

Themes change the overall appearance of the plot. In the case of networks, however, you rarely need to set many arguments. The most important ones are probably removing the legend with theme(legend.position = "none") and adding a title and a footnote (e.g., containing information on how the data were acquired) with labs():

ggraph(abortion_graph) +
  geom_edge_link0(aes(edge_width = weight), edge_color = "grey66") +
  geom_node_point(aes(fill = party), shape = 21) +
  scale_fill_manual(values = c(Republican = "red", Democrat = "blue")) +
  scale_edge_width_continuous(range = c(0.1,0.7)) +
  labs(title = "Agreement over abortion bills",
       caption = "Data acquired using RVoteview and Infoview") +
  theme_graph() +
  theme(legend.position = "none")
## Using `stress` as default layout

Facets

If you want to split up your graphs according to different characteristics, you can use facet_graph(edge_characteristic ~ node_characteristic).

ggraph(abortion_graph) +
  geom_edge_link0(aes(edge_width = weight), edge_color = "grey66") +
  geom_node_point(aes(fill = party), shape = 21) +
  facet_graph(weight ~ gender)
## Using `stress` as default layout

If you only want to split it up according to different node characteristics, use facet_nodes(node_characteristic_1 ~ node_characteristic_2). The first variable is spread out over rows, the second one over columns. If you only want to split it up by one characteristic, insert only one (and place the tilde accordingly).

ggraph(abortion_graph) +
  geom_edge_link0(aes(edge_width = weight), edge_color = "grey66") +
  geom_node_point(aes(fill = party), shape = 21) +
  facet_nodes(party ~ gender)
## Using `stress` as default layout

ggraph(abortion_graph) +
  geom_edge_link0(aes(edge_width = weight), edge_color = "grey66") +
  geom_node_point(aes(fill = party), shape = 21) +
  facet_nodes(~ gender)
## Using `stress` as default layout

The same applies to edges – just use facet_edges(edge_characteristic_1 ~ edge_characteristic_2).
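
A minimal sketch of facet_edges() on a toy graph could look as follows. The data, including the edge attribute relation, are hypothetical and only serve as an illustration.

```r
library(tidygraph)
library(ggraph)

# Hypothetical toy data: edges with a categorical attribute called relation
toy_nodes <- data.frame(name = c("a", "b", "c", "d"))
toy_edges <- data.frame(from = c("a", "a", "b", "c"),
                        to = c("b", "c", "c", "d"),
                        relation = c("ally", "rival", "ally", "rival"))
toy_graph <- tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = FALSE)

# One panel per value of relation; each panel shows only the matching edges
p_facet_edges <- ggraph(toy_graph, layout = "circle") +
  geom_edge_link0(edge_color = "grey66") +
  geom_node_point() +
  facet_edges(~ relation)

p_facet_edges
```

Note that the nodes are repeated in every panel, so the node positions stay comparable across the facets.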

Further readings

References

Wickham, Hadley. 2010. “A Layered Grammar of Graphics.” Journal of Computational and Graphical Statistics 19(1):3–28.


  1. Colors in R can be found [here](http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf). Another option is to use scale_fill_brewer() and scale_color_brewer(). These functions offer all palettes available at <colorbrewer2.org>.